OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Word matching using single closed contours for indexing handwritten historical documents

Internal identifier: 000F18 (Main/Exploration); previous: 000F17; next: 000F19

Authors: Tomasz Adamek [Ireland]; Noel E. O'Connor [Ireland]; Alan F. Smeaton [Ireland]

Source: International journal on document analysis and recognition (Print) [ISSN 1433-2833]; 2007

RBID : Pascal:07-0469287

French descriptors

English descriptors

Abstract

Effective indexing is crucial for providing convenient access to scanned versions of large collections of historically valuable handwritten manuscripts. Since traditional handwriting recognizers based on optical character recognition (OCR) do not perform well on historical documents, a holistic word recognition approach has recently gained popularity as an attractive and more straightforward solution (Lavrenko et al. in Proc. Document Image Analysis for Libraries (DIAL'04), pp. 278-287, 2004). Such techniques attempt to recognize words based on scalar and profile-based features extracted from whole word images. In this paper, we propose a new approach to holistic word recognition for historical handwritten manuscripts based on matching word contours instead of whole images or word profiles. The new method consists of robust extraction of closed word contours and the application of an elastic contour matching technique originally proposed for general shapes (Adamek and O'Connor in IEEE Trans Circuits Syst Video Technol 5:2004). We demonstrate that multiscale contour-based descriptors can effectively capture intrinsic word features, avoiding any segmentation of words into smaller subunits. Our experiments show a recognition accuracy of 83%, which considerably exceeds the performance of other systems reported in the literature.


Affiliations: Centre for Digital Video Processing, Dublin City University, Dublin, Ireland (all three authors)



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Word matching using single closed contours for indexing handwritten historical documents</title>
<author>
<name sortKey="Adamek, Tomasz" sort="Adamek, Tomasz" uniqKey="Adamek T" first="Tomasz" last="Adamek">Tomasz Adamek</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Centre for Digital Video Processing, Dublin City University</s1>
<s2>Dublin</s2>
<s3>IRL</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Irlande (pays)</country>
<wicri:noRegion>Dublin</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="O Connor, Noel E" sort="O Connor, Noel E" uniqKey="O Connor N" first="Noel E." last="O'Connor">Noel E. O'Connor</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Centre for Digital Video Processing, Dublin City University</s1>
<s2>Dublin</s2>
<s3>IRL</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Irlande (pays)</country>
<wicri:noRegion>Dublin</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Smeaton, Alan F" sort="Smeaton, Alan F" uniqKey="Smeaton A" first="Alan F." last="Smeaton">Alan F. Smeaton</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Centre for Digital Video Processing, Dublin City University</s1>
<s2>Dublin</s2>
<s3>IRL</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Irlande (pays)</country>
<wicri:noRegion>Dublin</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">07-0469287</idno>
<date when="2007">2007</date>
<idno type="stanalyst">PASCAL 07-0469287 INIST</idno>
<idno type="RBID">Pascal:07-0469287</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000322</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000464</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000260</idno>
<idno type="wicri:doubleKey">1433-2833:2007:Adamek T:word:matching:using</idno>
<idno type="wicri:Area/Main/Merge">000F31</idno>
<idno type="wicri:Area/Main/Curation">000F18</idno>
<idno type="wicri:Area/Main/Exploration">000F18</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Word matching using single closed contours for indexing handwritten historical documents</title>
<author>
<name sortKey="Adamek, Tomasz" sort="Adamek, Tomasz" uniqKey="Adamek T" first="Tomasz" last="Adamek">Tomasz Adamek</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Centre for Digital Video Processing, Dublin City University</s1>
<s2>Dublin</s2>
<s3>IRL</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Irlande (pays)</country>
<wicri:noRegion>Dublin</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="O Connor, Noel E" sort="O Connor, Noel E" uniqKey="O Connor N" first="Noel E." last="O'Connor">Noel E. O'Connor</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Centre for Digital Video Processing, Dublin City University</s1>
<s2>Dublin</s2>
<s3>IRL</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Irlande (pays)</country>
<wicri:noRegion>Dublin</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Smeaton, Alan F" sort="Smeaton, Alan F" uniqKey="Smeaton A" first="Alan F." last="Smeaton">Alan F. Smeaton</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Centre for Digital Video Processing, Dublin City University</s1>
<s2>Dublin</s2>
<s3>IRL</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Irlande (pays)</country>
<wicri:noRegion>Dublin</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">International journal on document analysis and recognition : (Print)</title>
<title level="j" type="abbreviated">Int. j. doc. anal. recognit. : (Print)</title>
<idno type="ISSN">1433-2833</idno>
<imprint>
<date when="2007">2007</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">International journal on document analysis and recognition : (Print)</title>
<title level="j" type="abbreviated">Int. j. doc. anal. recognit. : (Print)</title>
<idno type="ISSN">1433-2833</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Annotation</term>
<term>Character recognition</term>
<term>Document analysis</term>
<term>Image analysis</term>
<term>Image processing</term>
<term>Indexing</term>
<term>Manuscript character</term>
<term>Multiscale method</term>
<term>Natural language</term>
<term>Optical character recognition</term>
<term>Pattern extraction</term>
<term>Performance evaluation</term>
<term>Segmentation</term>
<term>Video signal</term>
<term>Word</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Indexation</term>
<term>Caractère manuscrit</term>
<term>Reconnaissance optique caractère</term>
<term>Reconnaissance caractère</term>
<term>Mot</term>
<term>Langage naturel</term>
<term>Analyse documentaire</term>
<term>Analyse image</term>
<term>Traitement image</term>
<term>Signal vidéo</term>
<term>Evaluation performance</term>
<term>Annotation</term>
<term>Extraction forme</term>
<term>Méthode échelle multiple</term>
<term>Segmentation</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Effective indexing is crucial for providing convenient access to scanned versions of large collections of historically valuable handwritten manuscripts. Since traditional handwriting recognizers based on optical character recognition (OCR) do not perform well on historical documents, a holistic word recognition approach has recently gained popularity as an attractive and more straightforward solution (Lavrenko et al. in Proc. Document Image Analysis for Libraries (DIAL'04), pp. 278-287, 2004). Such techniques attempt to recognize words based on scalar and profile-based features extracted from whole word images. In this paper, we propose a new approach to holistic word recognition for historical handwritten manuscripts based on matching word contours instead of whole images or word profiles. The new method consists of robust extraction of closed word contours and the application of an elastic contour matching technique originally proposed for general shapes (Adamek and O'Connor in IEEE Trans Circuits Syst Video Technol 5:2004). We demonstrate that multiscale contour-based descriptors can effectively capture intrinsic word features, avoiding any segmentation of words into smaller subunits. Our experiments show a recognition accuracy of 83%, which considerably exceeds the performance of other systems reported in the literature.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Irlande (pays)</li>
</country>
</list>
<tree>
<country name="Irlande (pays)">
<noRegion>
<name sortKey="Adamek, Tomasz" sort="Adamek, Tomasz" uniqKey="Adamek T" first="Tomasz" last="Adamek">Tomasz Adamek</name>
</noRegion>
<name sortKey="O Connor, Noel E" sort="O Connor, Noel E" uniqKey="O Connor N" first="Noel E." last="O'Connor">Noel E. O'Connor</name>
<name sortKey="Smeaton, Alan F" sort="Smeaton, Alan F" uniqKey="Smeaton A" first="Alan F." last="Smeaton">Alan F. Smeaton</name>
</country>
</tree>
</affiliations>
</record>
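
The record above uses the prefixes wicri: and inist: without declaring them, so a namespace-aware XML parser will probably reject it as written. The following is a minimal Python sketch of one way to read it anyway, assuming it is acceptable to bind those prefixes to placeholder URIs before parsing. The URIs, the function names parse_record and summarize, and the trimmed SAMPLE excerpt are illustrative assumptions, not part of Dilib or Wicri.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs: the record does not declare any, so these are
# assumptions made only to satisfy the parser.
PLACEHOLDER_NS = {
    "wicri": "urn:example:wicri",
    "inist": "urn:example:inist",
}


def parse_record(xml_text):
    """Parse a <record> whose wicri:/inist: prefixes are undeclared."""
    decls = " ".join(f'xmlns:{p}="{uri}"' for p, uri in PLACEHOLDER_NS.items())
    # Bind the prefixes on the root element only (first occurrence).
    patched = xml_text.replace("<record>", f"<record {decls}>", 1)
    return ET.fromstring(patched)


def summarize(record):
    """Return title, author names, year and English keywords from the header."""
    file_desc = record.find("./TEI/teiHeader/fileDesc")
    title_stmt = file_desc.find("titleStmt")
    keywords = [
        term.text
        for kw in record.iter("keywords")
        if kw.get("scheme") == "KwdEn"
        for term in kw.findall("term")
    ]
    return {
        "title": title_stmt.findtext("title"),
        "authors": [n.text for n in title_stmt.findall("./author/name")],
        "year": file_desc.findtext("./publicationStmt/date"),
        "keywords_en": keywords,
    }


# Trimmed excerpt of the record, kept short for illustration.
SAMPLE = """<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Word matching using single closed contours for indexing handwritten historical documents</title>
<author>
<name sortKey="Adamek, Tomasz" first="Tomasz" last="Adamek">Tomasz Adamek</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s2>Dublin</s2>
</inist:fA14>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<date when="2007">2007</date>
</publicationStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Indexing</term>
<term>Word</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
</TEI>
</record>"""

info = summarize(parse_record(SAMPLE))
print(info)
```

Patching only the root tag leaves the rest of the record untouched. Attributes such as wicri:level then appear under the placeholder URI, so any code that reads them has to use the same mapping.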

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000F18 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000F18 | SxmlIndent | more
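
The same lookup could be scripted. The sketch below is a hedged Python wrapper around the pipeline shown above, assuming the Dilib tools HfdSelect and SxmlIndent are on the PATH. It uses only the options that appear in the commands above (-h and -nk). The function names build_commands and fetch_record and the example directory are hypothetical.

```python
import posixpath
import subprocess


def build_commands(explor_step, key):
    """Return the argument lists for the two stages of the pipeline above."""
    hfd = posixpath.join(explor_step, "biblio.hfd")
    select_cmd = ["HfdSelect", "-h", hfd, "-nk", key]
    indent_cmd = ["SxmlIndent"]
    return select_cmd, indent_cmd


def fetch_record(explor_step, key):
    """Run HfdSelect | SxmlIndent and return the indented XML as text.

    Only works where the Dilib commands are installed.
    """
    select_cmd, indent_cmd = build_commands(explor_step, key)
    select = subprocess.Popen(select_cmd, stdout=subprocess.PIPE)
    indent = subprocess.run(
        indent_cmd, stdin=select.stdout, capture_output=True, text=True, check=True
    )
    select.stdout.close()
    select.wait()
    return indent.stdout


# Hypothetical location of the exploration step; adjust to the local install.
commands = build_commands("/srv/wicri/OcrV1/Data/Main/Exploration", "000F18")
print(commands)
```

The pager (more) is left out because the output is returned as a string rather than shown interactively.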

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:07-0469287
   |texte=   Word matching using single closed contours for indexing handwritten historical documents
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024